Role of Verbs in Document Analysis
Abstract
We present results of two methods for assessing the event profile of news articles as a function of verb type. The unique contribution of this research is the focus on the role of verbs, rather than nouns. Two algorithms are presented and evaluated, one of which is shown to accurately discriminate documents by type and semantic properties, i.e. the event profile. The initial method, using WordNet (Miller et al. 1990), produced multiple cross-classification of articles, primarily due to the bushy nature of the verb tree coupled with the sense disambiguation problem. Our second approach, using English Verb Classes and Alternations (EVCA; Levin 1993), showed that monosemous categorization of the frequent verbs in WSJ made it possible to usefully discriminate documents. For example, our results show that articles in which communication verbs predominate tend to be opinion pieces, whereas articles with a high percentage of agreement verbs tend to be about mergers or legal cases. An evaluation is performed on the results using Kendall's τ. We present convincing evidence for using verb semantic classes as a discriminant in document classification.[1]

Footnote 1: The authors acknowledge earlier implementations by James Shaw, and very valuable discussion from Vasileios Hatzivassiloglou, Kathleen McKeown and Nina Wacholder. Partial funding for this project was provided by NSF award #IRI-9618797 STIMULATE: Generating Coherent Summaries of On-Line Documents: Combining Statistical and Symbolic Techniques (co-PI's McKeown and Klavans), and by the Columbia University Center for Research on Information Access.

1 Motivation

We present techniques to characterize document type and event by using semantic classification of verbs. The intuition motivating our research is illustrated by an examination of the role of nouns and verbs in documents.
The listing below shows the ontological categories which express the fundamental conceptual components of propositions, using the framework of Jackendoff (1983). Each category permits the formation of a wh-question; for [THING], for example, "what did you buy?" can be answered by the noun "a fish". The wh-questions for [ACTION] and [EVENT] can only be answered by verbal constructions: the question "what did you do?" requires a verb in response, e.g. jog, write, fall, etc.

[THING] [DIRECTION] [ACTION] [PLACE] [MANNER] [EVENT] [AMOUNT]

The distinction in the ontological categories of nouns and verbs is reflected in information extraction systems. For example, given the noun phrases fares and US Air that occur within a particular article, the reader will know what the story is about, i.e. fares and US Air. However, the reader will not know the [EVENT], i.e. what happened to the fares or to US Air. Did airfare prices rise, fall or stabilize? These are the verbs most typically applicable to prices, and which embody the event.

1.1 Focus on the Noun

Many natural language analysis systems focus on nouns and noun phrases in order to identify information on who, what, and where. For example, in summarization, Barzilay and Elhadad (1997) and Lin and Hovy (1997) focus on multiword noun phrases. For information extraction tasks, such as the DARPA-sponsored Message Understanding Conferences (1992), only a few projects use verb phrases (events), e.g. Appelt et al. (1993), Lin (1993). In contrast, the named entity task, which identifies nouns and noun phrases, has generated numerous projects, as evidenced by a host of papers in recent conferences (e.g. Wacholder et al. 1997, Palmer and Day 1997, Neumann et al. 1997). Although rich information on nominal participants, actors, and other entities is provided, the named entity task provides no information on what happened in the document, i.e. the event or action.
Less progress has been made on ways to utilize verbal information efficiently. In earlier systems with stemming, many of the verbal and nominal forms were conflated, sometimes erroneously. With the development of more sophisticated tools, such as part of speech taggers, more accurate verb phrase identification is possible. We present in this paper an effective way to utilize verbal information for document type discrimination.

1.2 Focus on the Verb

Our initial observations suggested that both occurrence and distribution of verbs in news articles provide meaningful insights into both article type and content. Exploratory analysis of parsed Wall Street Journal data [2] suggested that articles characterized by movement verbs such as drop, plunge, or fall have a different event profile from articles with a high percentage of communication verbs, such as report, say, comment, or complain. However, without associated nominal arguments, it is impossible to know whether the [THING] that drops refers to airfare prices or projected earnings.

Footnote 2: Penn TreeBank (Marcus et al. 1994) from the Linguistic Data Consortium.

In this paper, we assume that the set of verbs in a document, when considered as a whole, can be viewed as part of the conceptual map of the events and action in a document, in the same way that the set of nouns has been used as a concept map for entities. This paper reports on two methods using verbs to determine an event profile of the document, while also reliably categorizing documents by type. Intuitively, the event profile refers to the classification of an article by the kind of event. For example, the article could be a discussion event, a reporting event, or an argument event.

To illustrate, consider a sample article from WSJ of average length (12 sentences) with a high percentage of communication verbs. The profile of the article shows that there are 19 verbs: 11 (57%) are communication verbs, including add, report, say, and tell. Other verbs include be skeptical, carry, produce, and close. Representative nouns include Polaroid Corp., Michael Ellmann, Wertheim Schroder Co., Prudential-Bache, savings, operating results, gain, revenue, cuts, profit, loss, sales, analyst, and spokesman. In this case, the verbs clearly contribute information that this article is a report with more opinions than new facts. The preponderance of communication verbs, coupled with proper noun subjects and human nouns (e.g. spokesman, analyst), suggests a discussion article. If verbs are ignored, this fact would be overlooked. Matches on frequent nouns like gain and loss do not discriminate this article from one which announces a gain or loss as breaking news; indeed, according to our results, a breaking news article would feature a higher percentage of motion verbs rather than verbs of com-
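The verb-class profile described in the sample-article discussion above (the share of an article's verbs that falls into each semantic class) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the lexicon `VERB_CLASSES`, the class label "other", and the functions `event_profile` and `dominant_class` are hypothetical names, the lexicon contains only verbs named in the text rather than the WordNet or EVCA classes, and the input is assumed to be a list of verb lemmas already extracted by a tagger or parser.

```python
from collections import Counter

# Toy lexicon for illustration only: it lists just the verbs named in the
# text above and is NOT the WordNet or EVCA classification itself.
VERB_CLASSES = {
    "add": "communication", "comment": "communication",
    "complain": "communication", "report": "communication",
    "say": "communication", "tell": "communication",
    "drop": "motion", "fall": "motion", "plunge": "motion",
}

def event_profile(verb_lemmas):
    """Map each verb class to its share of the verbs in one article."""
    # Verbs missing from the toy lexicon are grouped under "other".
    counts = Counter(VERB_CLASSES.get(v, "other") for v in verb_lemmas)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}

def dominant_class(profile):
    """Return the class with the largest share, or None for an empty profile."""
    return max(profile, key=profile.get) if profile else None

# Made-up verb list for demonstration, not data from the paper.
verbs = ["say", "say", "report", "tell", "carry", "fall"]
profile = event_profile(verbs)
print(dominant_class(profile), round(profile["communication"], 2))
```

Under these assumptions, an article dominated by communication verbs would be flagged as a likely discussion or opinion piece, while a high motion share would point toward breaking news; how the classes are actually assigned and thresholded is left open here.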
Similar resources
The role of Persian causative markers in the acquisition of English causative verbs
This project investigates the relationship between lexical semantics and causative morphology in the acquisition of causative/inchoative-related verbs in English as a foreign language by Iranian speakers. Results of translation and picture judgment task show although L2 learners have largely acquired the correct lexico-syntactic classification of verbs in English, they were constrained by ...
INVESTIGATING THE ROLE OF CAUSATIVIZATION IN OVERPASSIVIZATION OF UN-ACCUSATIVE VERBS BY IRANIAN ENGLISH MAJORS
The current study aims at exploring the role of causativization as one of the causes stated in the literature for overpassivization of English unaccusatives in an Iranian context. The study was conducted using three data collection procedures, an Oxford Placement Test, a Grammaticality Judgment Task, and a Production Task. The results revealed that causativization errors with non-alternating una...
The Role of Conceptualizable Agent in Overpassivization of English Unaccusatives in Iranian English Majors
The present study is an attempt to explore the effect of one of the pragmatic elements of discourse (namely the conceptualizable agent) on overpassivization of English unaccusative verbs. Using the questionnaire originally used by Ju (2000), 206 Iranian intermediate and advanced English majors were asked to choose the more grammatical form (active or passive) in target sentences wi...
Analysis of Citation Verbs in EFL Academic Writing: The Case Study of Dissertations and Theses at the University of Dar es Salaam, Tanzania
This study was an analytical account of EFL postgraduate learners’ use of verbs in citing other scholars in their own writing. Particular interest was differing extents of these verbs as categorised by Myer (1997), namely verbs representing statement of scholarly writing, verbs communicating knowledge of scholarly writing, and verbs denoting cognition of scholarly writing, each of which has sub...
Verbs in Applied Linguistics Research Article Introductions: Semantic and syntactic analysis
This study aims to investigate the semantic and syntactic features of verbs used in the introduction section of Applied Linguistics research articles published in Iranian and international journals. A corpus of 20 research article introductions (10 from each journal) was used. The corpus was analysed for the syntactic features (tense, aspect and voice) and semantic meaning of verbs. The finding...
Publication date: 1998